How much do you know about philosophy? If you have read widely in it, that’s great. I hope you find this article helpful from the perspective of data analysis and visualization. If you are not very familiar with the subject (like me), don’t worry: you will get a first impression of each of the main schools of philosophy.

The first part of this data story analyzes the whole philosophy corpus and gives an overview of the data set we are working with. The second part focuses on sentiment analysis of two interesting schools: Capitalism and Communism.

Part 1: Data set overview. Let’s look at the whole corpus!

Setting up the environment

This report was prepared with the following R environment.

print(R.version)
               _                           
platform       x86_64-apple-darwin17.0     
arch           x86_64                      
os             darwin17.0                  
system         x86_64, darwin17.0          
status                                     
major          4                           
minor          0.3                         
year           2020                        
month          10                          
day            10                          
svn rev        79318                       
language       R                           
version.string R version 4.0.3 (2020-10-10)
nickname       Bunny-Wunnies Freak Out     

Load the packages and source files

Not all of the packages listed below are used in this report. Together they cover the whole workflow, from generating and cleaning the original data set to data analysis, manipulation, visualization, and sentiment analysis; this report mainly relies on the latter.

I wrote two helper functions, f.clean.corpus and f.make.tdm, in ../lib/wordcloudFuncs.R. Since we generate many word clouds, these wrappers save us from repeating similar, redundant code for each one.

packages.used=c("rvest", "tibble", 
                "sentimentr", "gplots", "dplyr",
                "tm", "syuzhet", "factoextra", 
                "beeswarm", "scales", "RColorBrewer",
                "RANN", "topicmodels", "stringr", 
                "ggridges", "wordcloud", "wordcloud2",
                "tidytext", "knitr", "tidyverse")
# check which packages still need to be installed
packages.needed=setdiff(packages.used, installed.packages()[,1])
# install additional packages
if(length(packages.needed)>0){
  install.packages(packages.needed, dependencies = TRUE)
}
# load packages
library("rvest")
library("tibble")
library("syuzhet")
library("sentimentr")
library("gplots")
library("dplyr")
library("tm")
library("factoextra")
library("beeswarm")
library("scales")
library("RColorBrewer")
library("RANN")
library("topicmodels")
library("stringr")
library("ggplot2")
#library("ggridges")
#library("viridis")
library("wordcloud")
library("wordcloud2")
library("tidytext")
library("knitr")
library("tidyverse")

#source("../lib/plotstacked.R")
source("../lib/speechFuncs.R")
source("../lib/wordcloudFuncs.R")

Load the dataset

The data set ../data/philosophy_data.csv contains 360,808 sentences taken from 51 texts spanning the history of philosophy. More details are available at https://www.kaggle.com/kouroshalizadeh/history-of-philosophy.

file<-'../data/philosophy_data.csv'
full_data <- read.csv(file)
full_data <- full_data %>% 
  mutate(word.count = f.word_count(sentence_spacy))
# sentence.list is generated by the chunk {r generate sentence.list}, which takes
# about 25-30 minutes to run. To save time, we load the cached result below,
# so you do not need to run that chunk yourself.
subfile <- '../output/sentence.list.csv'
sentence.list <- read.csv(subfile)

reduced_data = full_data %>% select(title, author, school, sentence_spacy, original_publication_date, sentence_length, word.count)

#trial_data <- reduced_data[1:1000,]
data <- reduced_data %>% filter(school == "capitalism" | school == "communism")

Step 1: How long are the sentences in each school?

data.byschool <- reduced_data %>% group_by(school) %>%
  summarise(meanlength = mean(word.count)) %>%
  arrange(meanlength)
  
g1 <- data.byschool %>% ggplot(aes(x = reorder(school, meanlength), y = meanlength, fill = school)) + 
  geom_bar(stat = "identity") +
  labs(
    title = "Bar plot: mean sentence length (word count) by school",
    x = "Schools",
    y = "Mean Length of Sentences"
  ) +
  coord_flip()

g1

Beyond the mean, the violin and box plots below show the actual distribution of sentence length in each school.

data.byschool <- reduced_data %>% group_by(school) 
g2 <- data.byschool %>% ggplot(aes(x = school, y = word.count, fill = school)) +
  geom_violin(alpha = 0.5) +
  #geom_point(position = position_jitter(seed = 1, width = 0.2)) +
  theme(legend.position = "none") +
  geom_boxplot(width=.1) +
  labs(
    title = "violin plot + box plot: spread of sentences' length \n of different schools",
    x = "Schools",
    y = "Length of Sentences"
  ) +
  coord_flip()

g2

Step 2: What are the most frequently mentioned words?

Let’s start with an overview of word frequencies in the whole corpus.

Now let’s see the word clouds of different schools.

Capitalism:

Empiricism:

German idealism:

Continental:

Rationalism:

Aristotle:

Feminism:

Communism:

Phenomenology:

Stoicism:

Analytic:

Nietzsche:

Plato:

Part 2: Sentiment analysis of Capitalism and Communism

names(sentence.list)
 [1] "X.1"                       "X"                         "title"                    
 [4] "author"                    "school"                    "sentence_spacy"           
 [7] "original_publication_date" "sentence_length"           "sentences"                
[10] "anger"                     "anticipation"              "disgust"                  
[13] "fear"                      "joy"                       "sadness"                  
[16] "surprise"                  "trust"                     "negative"                 
[19] "positive"                  "sent.id"                   "word.count"               

Emotionally charged sentences

Capitalism:

emotions.types=c("anticipation", "joy", "surprise", "trust",
                 "anger", "disgust", "fear", "sadness", "negative", "positive")

speech.df <- sentence.list %>%
  filter(school == "capitalism", word.count < 20) %>%
  select(sentences, anger:positive)
# for each emotion column, the sentence with the highest score
as.character(speech.df$sentences[apply(speech.df[, -1], 2, which.max)])
 [1] "This complaint, however, of the scarcity of money, is not always confined to improvident spendthrifts."                                        
 [2] "This money, however, was for a long time, received at the exchequer, by weight, and not by tale."                                              
 [3] "Bankruptcy is, perhaps, the greatest and most humiliating calamity which can befal an innocent man."                                           
 [4] "Poverty, though it no doubt discourages, does not always prevent, marriage."                                                                   
 [5] "The cheapness and plenty of good land encourage improvement, and enable the proprietor to pay those high wages."                               
 [6] "below its standard weight, the bank would, in this case, have lost only one percent."                                                          
 [7] "Its nominal price was a good deal lower than at present."                                                                                      
 [8] "The cheapness and plenty of good land encourage improvement, and enable the proprietor to pay those high wages."                               
 [9] "Not only ignorance and misinformation, but friendship, party animosity, and private resentment, are said frequently to mislead such assessors."
[10] "The cheapness and plenty of good land encourage improvement, and enable the proprietor to pay those high wages."                               

Communism:


speech.df <- sentence.list %>%
  filter(school == "communism", word.count < 20) %>%
  select(sentences, anger:positive)
# for each emotion column, the sentence with the highest score
as.character(speech.df$sentences[apply(speech.df[, -1], 2, which.max)])
 [1] "In actual history it is notorious that conquest, enslavement, robbery, murder, briefly force, play the great part."        
 [2] "The new nobility was the child of its time, for which money was the power of all powers."                                  
 [3] "Or perhaps Bastiat means, that a mode of production based on slavery is based on a system of plunder."                     
 [4] "In actual history it is notorious that conquest, enslavement, robbery, murder, briefly force, play the great part."        
 [5] "the rich grow rapidly richer, whilst there is no perceptible advance in the comfort enjoyed by the industrial classes."    
 [6] "One old woman was burnt to death in the flames of the hut, which she refused to leave."                                    
 [7] "Then one use value is just as good as another, provided only it be present in sufficient quantity."                        
 [8] "Dr. Simon, medical officer to the Privy Council, chose for this work the above mentioned Dr. Smith."                       
 [9] "In actual history it is notorious that conquest, enslavement, robbery, murder, briefly force, play the great part."        
[10] "In order to be fully prepared for his task, the working class revolutionary must also become a professional revolutionary."

Clustering of emotions

Capitalism:

heatmap.2(cor(sentence.list%>%filter(school=="capitalism")%>%select(anger:positive)), 
          scale = "none", 
          col = bluered(100), margins = c(6, 6), key = FALSE,
          trace = "none", density.info = "none")

Communism:

heatmap.2(cor(sentence.list%>%filter(school=="communism")%>%select(anger:positive)), 
          scale = "none", 
          col = bluered(100), margins = c(6, 6), key = FALSE,
          trace = "none", density.info = "none")

sentence.list.capitalism <- sentence.list %>% filter(school=="capitalism")
# share of sentences with a nonzero score per emotion (NRC scores are word counts)
emo.means1=colMeans(select(sentence.list.capitalism, anger:positive)>0.01)
col.use1=c("red2", "darkgoldenrod1", "chartreuse3","blueviolet", "darkgoldenrod1", "dodgerblue3", "darkgoldenrod1", "darkgoldenrod1", "black", "darkgoldenrod1")
par(mar=c(4, 6, 2, 1))  # widen the left margin for the horizontal bar labels
barplot(emo.means1[order(emo.means1)], las=2, col=col.use1[order(emo.means1)], horiz=TRUE,
        cex.names=0.7, main="Capitalism")

sentiment.df1 <- data.frame(
  group=c("anger", "anticipation", "disgust","fear", "joy", "sadness", "surprise", "trust", "negative", "positive"),
  value=emo.means1
)

sentiment.df1 <- sentiment.df1 %>% 
  mutate(perc = value / sum(value)) %>% 
  arrange(perc) %>%
  mutate(labels = scales::percent(perc))


ggplot(sentiment.df1, aes(x="", y=perc, fill=group)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  geom_text(aes(label = labels), 
            position = position_stack(vjust = 0.5),
            show.legend = FALSE,
            color = "white", size=3) +
  scale_fill_brewer(palette="Spectral") +
  labs(
    title = "Percentage of the sentiment in Capitalism",
    x = "",
    y = ""
  )

sentence.list.communism <- sentence.list %>% filter(school=="communism")
# share of sentences with a nonzero score per emotion (NRC scores are word counts)
emo.means2=colMeans(select(sentence.list.communism, anger:positive)>0.01)
col.use2=c("red2", "darkgoldenrod1", "chartreuse3","blueviolet", "darkgoldenrod1", "dodgerblue3", "darkgoldenrod1", "darkgoldenrod1", "black", "darkgoldenrod1")
par(mar=c(4, 6, 2, 1))  # widen the left margin for the horizontal bar labels
barplot(emo.means2[order(emo.means2)], las=2, col=col.use2[order(emo.means2)], horiz=TRUE,
        cex.names=0.7, main="Communism")

sentiment.df2 <- data.frame(
  group=c("anger", "anticipation", "disgust","fear", "joy", "sadness", "surprise", "trust", "negative", "positive"),
  value=emo.means2
)

sentiment.df2 <- sentiment.df2 %>% 
  mutate(perc = value / sum(value)) %>% 
  arrange(perc) %>%
  mutate(labels = scales::percent(perc))


ggplot(sentiment.df2, aes(x="", y=perc, fill=group)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  geom_text(aes(label = labels), 
            position = position_stack(vjust = 0.5),
            show.legend = FALSE,
            color = "white", size=3) +
  scale_fill_brewer(palette="Spectral") +
  labs(
    title = "Percentage of the sentiment in Communism",
    x = "",
    y = ""
  )

sentiment.df.all <- data.frame(
  group=c("anger", "anticipation", "disgust","fear", "joy", "sadness", "surprise", "trust", "negative", "positive"),
  capitalism=emo.means1,
  communism=emo.means2
)

sentiment.df.all.long <- sentiment.df.all %>% pivot_longer(c("capitalism", "communism"))

g3 <- sentiment.df.all.long %>% ggplot(aes(x = group, y = value, fill = name)) +
  geom_col(position = "dodge") +
  labs(
    x = "",
    y = "share of sentences with a nonzero score",
    title = "Sentiment comparison: Capitalism vs. Communism"
  )

g3

sentence.list.Marx <- sentence.list %>% filter(author=="Marx")
emo.means.Marx=colMeans(select(sentence.list.Marx, anger:positive)>0.01)

sentence.list.Lenin <- sentence.list %>% filter(author=="Lenin")
emo.means.Lenin=colMeans(select(sentence.list.Lenin, anger:positive)>0.01)

sentence.list.Smith <- sentence.list %>% filter(author=="Smith")
emo.means.Smith=colMeans(select(sentence.list.Smith, anger:positive)>0.01)

sentence.list.Ricardo <- sentence.list %>% filter(author=="Ricardo")
emo.means.Ricardo=colMeans(select(sentence.list.Ricardo, anger:positive)>0.01)

sentence.list.Keynes <- sentence.list %>% filter(author=="Keynes")
emo.means.Keynes=colMeans(select(sentence.list.Keynes, anger:positive)>0.01)

sentiment.df.byauthor <- data.frame(
  group=c("anger", "anticipation", "disgust","fear", "joy", "sadness", "surprise", "trust", "negative", "positive"),
  Marx=emo.means.Marx,
  Lenin=emo.means.Lenin,
  Smith=emo.means.Smith,
  Ricardo=emo.means.Ricardo,
  Keynes=emo.means.Keynes
)

sentiment.df.byauthor.long <- sentiment.df.byauthor %>% pivot_longer(c("Marx", "Lenin", "Smith", "Ricardo", "Keynes" ))

g4 <- sentiment.df.byauthor.long %>% ggplot(aes(x = group, y = value, fill = name)) +
  geom_col(position = "dodge") +
  labs(
    x = "",
    y = "share of sentences with a nonzero score",
    title = "Sentiment comparison: all the authors",
    subtitle = "Communism: Marx, Lenin \nCapitalism: Keynes, Ricardo, Smith"
  )

g4

---
title: "Investigating different schools of Philosophy--Capitalism vs. Communism"
output: html_notebook
---

How much do you know about philosophy? If you have read widely in it, that's great. I hope you find this article helpful from the perspective of data analysis and visualization. If you are not very familiar with the subject (like me), don't worry: you will get a first impression of each of the main schools of philosophy.

The first part of this data story analyzes the whole philosophy corpus and gives an overview of the data set we are working with. The second part focuses on sentiment analysis of two interesting schools: Capitalism and Communism.

# Part 1: Data set overview. Let's look at the whole corpus!

## Setting up the environment

```{r setup, warning=FALSE, message=FALSE,echo=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

This report was prepared with the following R environment.

```{r}
print(R.version)
```

## Load the packages and source files
Not all of the packages listed below are used in this report. Together they cover the whole workflow, from generating and cleaning the original data set to data analysis, manipulation, visualization, and sentiment analysis; this report mainly relies on the latter.

I wrote two helper functions, **f.clean.corpus** and **f.make.tdm**, in *../lib/wordcloudFuncs.R*. Since we generate many word clouds, these wrappers save us from repeating similar, redundant code for each one.

```{r, message=FALSE, warning=FALSE}
packages.used=c("rvest", "tibble", 
                "sentimentr", "gplots", "dplyr",
                "tm", "syuzhet", "factoextra", 
                "beeswarm", "scales", "RColorBrewer",
                "RANN", "topicmodels", "stringr", 
                "ggridges", "wordcloud", "wordcloud2",
                "tidytext", "knitr", "tidyverse")
# check which packages still need to be installed
packages.needed=setdiff(packages.used, installed.packages()[,1])
# install additional packages
if(length(packages.needed)>0){
  install.packages(packages.needed, dependencies = TRUE)
}
# load packages
library("rvest")
library("tibble")
library("syuzhet")
library("sentimentr")
library("gplots")
library("dplyr")
library("tm")
library("factoextra")
library("beeswarm")
library("scales")
library("RColorBrewer")
library("RANN")
library("topicmodels")
library("stringr")
library("ggplot2")
#library("ggridges")
#library("viridis")
library("wordcloud")
library("wordcloud2")
library("tidytext")
library("knitr")
library("tidyverse")

#source("../lib/plotstacked.R")
source("../lib/speechFuncs.R")
source("../lib/wordcloudFuncs.R")

```

## Load the dataset
The data set *../data/philosophy_data.csv* contains 360,808 sentences taken from 51 texts spanning the history of philosophy. More details are available at [https://www.kaggle.com/kouroshalizadeh/history-of-philosophy](https://www.kaggle.com/kouroshalizadeh/history-of-philosophy).

```{r read data, warning=FALSE, message=FALSE}
file<-'../data/philosophy_data.csv'
full_data <- read.csv(file)
full_data <- full_data %>% 
  mutate(word.count = f.word_count(sentence_spacy))
# sentence.list is generated by the chunk {r generate sentence.list}, which takes
# about 25-30 minutes to run. To save time, we load the cached result below,
# so you do not need to run that chunk yourself.
subfile <- '../output/sentence.list.csv'
sentence.list <- read.csv(subfile)
```
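`f.word_count()` is defined in *../lib/speechFuncs.R*, which is not shown here. To give a rough idea of what such a helper might do, here is a hypothetical whitespace-based word counter in base R. `word_count()` is an assumed name, and the real function may tokenize differently, for example around punctuation.

```r
# Hypothetical stand-in for f.word_count() from ../lib/speechFuncs.R:
# counts whitespace-separated tokens in each element of a character vector.
word_count <- function(x) {
  tokens <- strsplit(trimws(x), "[[:space:]]+")
  # empty strings yield zero tokens; sum() over a logical vector returns an integer
  vapply(tokens, function(w) sum(nchar(w) > 0), integer(1))
}

word_count(c("The cat sat.", "  two   words ", ""))
```

In the notebook, a counter like this would be applied column-wise inside `mutate()`, just as `f.word_count(sentence_spacy)` is above.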


```{r ,warning=FALSE, message=FALSE}

reduced_data = full_data %>% select(title, author, school, sentence_spacy, original_publication_date, sentence_length, word.count)

#trial_data <- reduced_data[1:1000,]
data <- reduced_data %>% filter(school == "capitalism" | school == "communism")
```


## Step 1: How long are the sentences in each school?

```{r ,warning=FALSE, message=FALSE}
data.byschool <- reduced_data %>% group_by(school) %>%
  summarise(meanlength = mean(word.count)) %>%
  arrange(meanlength)
  
g1 <- data.byschool %>% ggplot(aes(x = reorder(school, meanlength), y = meanlength, fill = school)) + 
  geom_bar(stat = "identity") +
  labs(
    title = "Bar plot: mean sentence length (word count) by school",
    x = "Schools",
    y = "Mean Length of Sentences"
  ) +
  coord_flip()

g1
```

Beyond the mean, the violin and box plots below show the actual distribution of sentence length in each school.

```{r, fig.width = 3, fig.height = 3}
data.byschool <- reduced_data %>% group_by(school) 
g2 <- data.byschool %>% ggplot(aes(x = school, y = word.count, fill = school)) +
  geom_violin(alpha = 0.5) +
  #geom_point(position = position_jitter(seed = 1, width = 0.2)) +
  theme(legend.position = "none") +
  geom_boxplot(width=.1) +
  labs(
    title = "violin plot + box plot: spread of sentences' length \n of different schools",
    x = "Schools",
    y = "Length of Sentences"
  ) +
  coord_flip()

g2
```
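Each box in the plot above summarizes one school by its quartiles and flags points beyond 1.5 times the interquartile range (IQR) as outliers. The base-R sketch below computes those numbers on made-up word counts; `toy` and `box_summary()` are illustrative names. As far as I know, it mirrors the defaults of ggplot2's `stat_boxplot()`, which uses `quantile()` for the hinges, but treat it as an approximation rather than ggplot2's exact code.

```r
# Made-up word counts for two schools (illustrative only)
toy <- data.frame(
  school     = rep(c("plato", "stoicism"), each = 5),
  word.count = c(5, 8, 10, 12, 40, 3, 4, 6, 7, 9),
  stringsAsFactors = FALSE
)

# Box-plot statistics: quartiles, whisker ends, and 1.5 * IQR outliers
box_summary <- function(x, coef = 1.5) {
  q <- as.numeric(quantile(x, c(0.25, 0.5, 0.75)))
  iqr <- q[3] - q[1]
  is_out <- x < q[1] - coef * iqr | x > q[3] + coef * iqr
  list(lower = q[1], median = q[2], upper = q[3],
       whiskers = range(x[!is_out]),  # most extreme non-outlying values
       outliers = x[is_out])
}

summaries <- lapply(split(toy$word.count, toy$school), box_summary)
summaries$plato$outliers
```

Running the same `lapply(split(...), box_summary)` on `reduced_data$word.count` and `reduced_data$school` would give the numbers behind each box. This is a useful check, since a few very long sentences can pull the mean in the bar plot well above the median.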




## Step 2: What are the most frequently mentioned words?
Let's start with an overview of word frequencies in the whole corpus.



```{r prepare corpus, warning=FALSE, message=FALSE,echo=FALSE}
text <- full_data$sentence_lowered
docs <- Corpus(VectorSource(text))
```

```{r clean corpus and make dtm, warning=FALSE, message=FALSE,echo=FALSE}
rm_words = c('also', 'areas', 'can', 'etc', 'get', 'just', 'like',
'lot', 'many', 'may', 'need', 'one', 's', 'set', 't',
'time', 'us', 'use', 'way', 'well', 'will', 'b', 'e',
'g', 'less', 'give', 'tell', 'im', 'take', 'coming',
'say', 'really', 'must')

docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```



```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,0.8),
          max.words=200,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(8,"Dark2"))
```

### Now let's see the word clouds of different schools.
Capitalism:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.capitalism <- full_data %>% filter(school == "capitalism")
text.capitalism <- text.capitalism$sentence_lowered
docs <- Corpus(VectorSource(text.capitalism))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```
Empiricism:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.empiricism <- full_data %>% filter(school == "empiricism")
text.empiricism <- text.empiricism$sentence_lowered
docs <- Corpus(VectorSource(text.empiricism))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```

German idealism:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.german_idealism <- full_data %>% filter(school == "german_idealism")
text.german_idealism <- text.german_idealism$sentence_lowered
docs <- Corpus(VectorSource(text.german_idealism))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```
Continental:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.continental <- full_data %>% filter(school == "continental")
text.continental <- text.continental$sentence_lowered
docs <- Corpus(VectorSource(text.continental))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```


Rationalism:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.rationalism <- full_data %>% filter(school == "rationalism")
text.rationalism <- text.rationalism$sentence_lowered
docs <- Corpus(VectorSource(text.rationalism))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```

Aristotle:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.aristotle <- full_data %>% filter(school == "aristotle")
text.aristotle <- text.aristotle$sentence_lowered
docs <- Corpus(VectorSource(text.aristotle))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```

Feminism:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.feminism <- full_data %>% filter(school == "feminism")
text.feminism <- text.feminism$sentence_lowered
docs <- Corpus(VectorSource(text.feminism))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```

Communism:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.communism <- full_data %>% filter(school == "communism")
text.communism <- text.communism$sentence_lowered
docs <- Corpus(VectorSource(text.communism))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```

Phenomenology: 

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.phenomenology <- full_data %>% filter(school == "phenomenology")
text.phenomenology <- text.phenomenology$sentence_lowered
docs <- Corpus(VectorSource(text.phenomenology))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```

Stoicism:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.stoicism <- full_data %>% filter(school == "stoicism")
text.stoicism <- text.stoicism$sentence_lowered
docs <- Corpus(VectorSource(text.stoicism))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```

Analytic:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.analytic <- full_data %>% filter(school == "analytic")
text.analytic <- text.analytic$sentence_lowered
docs <- Corpus(VectorSource(text.analytic))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```

Nietzsche:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.nietzsche <- full_data %>% filter(school == "nietzsche")
text.nietzsche <- text.nietzsche$sentence_lowered
docs <- Corpus(VectorSource(text.nietzsche))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```

Plato:

```{r, warning=FALSE, message=FALSE,echo=FALSE}
text.plato <- full_data %>% filter(school == "plato")
text.plato <- text.plato$sentence_lowered
docs <- Corpus(VectorSource(text.plato))
docs <- f.clean.corpus(docs)
tdm.overall <- f.make.tdm(docs)
```


```{r, fig.height=6, fig.width=6, warning=FALSE, message=FALSE,echo=FALSE}
wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,1.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
```
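The thirteen word-cloud chunks above differ only in the school name. Below is a minimal sketch of a loop-based alternative that builds one term-count table per school with `split()` and `lapply()`. It assumes simple regex tokenization in place of the tm pipeline inside `f.clean.corpus()`, so its counts will not match exactly. `term_counts_by_school()` is a hypothetical name, and its `stopwords` argument could be fed, for example, the `rm_words` vector defined earlier.

```r
# Hypothetical helper: term counts for every school in one pass, instead of
# one copy-pasted chunk per school. Base R tokenization, not tm.
term_counts_by_school <- function(df, stopwords = character(0)) {
  texts <- split(df$sentence_lowered, df$school)
  lapply(texts, function(txt) {
    txt <- gsub("[^a-z ]", " ", tolower(txt))    # keep letters and spaces only
    tokens <- unlist(strsplit(txt, " +"))
    tokens <- tokens[nchar(tokens) > 0 & !tokens %in% stopwords]
    counts <- sort(table(tokens), decreasing = TRUE)
    out <- data.frame(term = names(counts), count = as.integer(counts),
                      stringsAsFactors = FALSE)
    names(out)[2] <- "sum(count)"                 # same column name the wordcloud chunks use
    out
  })
}

# Toy usage with made-up sentences
toy <- data.frame(
  school = c("plato", "plato", "stoicism"),
  sentence_lowered = c("the soul is just", "the soul and the city", "nature and reason"),
  stringsAsFactors = FALSE
)
counts <- term_counts_by_school(toy, stopwords = c("the", "is", "and"))
counts$plato
```

Each table has the `term` and `sum(count)` columns used above, so a single `wordcloud()` call inside a `for` loop over `names(counts)` could replace the per-school chunks.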





# Part 2: Sentiment analysis of Capitalism and Communism

```{r generate sentence.list, warning=FALSE, message=FALSE,echo=FALSE}
# sentence.list=NULL
# 
# for(i in 1:nrow(data)){
#   sentences <- data$sentence_spacy[i]
#   if(length(sentences)>0){
#     emotions=get_nrc_sentiment(sentences)
#     #word.count=word_count(sentences)
#     # colnames(emotions)=paste0("emo.", colnames(emotions))
#     # in case the word counts are zeros?
#     #emotions=diag(1/(trial_data$sentence_length[1]+0.01))%*%as.matrix(emotions)
#     emotions = as.matrix(emotions)
#     sentence.list=rbind(sentence.list,
#                         cbind(data[i,],
#                               sentences=as.character(sentences),
#                               #word.count,
#                               emotions,
#                               sent.id=i
#                               )
#     )
#   }
# }
#
# sentence.list <- sentence.list %>% left_join(data) #join the word.count into sentence.list
#write.csv(sentence.list, "../output/sentence.list.csv", row.names = FALSE)
```
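The commented-out chunk above grows `sentence.list` with `rbind()` inside a loop. Each `rbind()` copies the accumulated data frame, so the cost grows quadratically with the number of rows, and that likely accounts for part of the 25-30 minute runtime. Below is a minimal sketch of the vectorized alternative: score all sentences in one call and bind the columns once. `toy_scorer()` is a hypothetical stand-in with a made-up four-word lexicon, not the NRC lexicon. `syuzhet::get_nrc_sentiment()` also accepts a whole character vector and returns one row per element, so in principle it could be used the same way.

```r
# Hypothetical stand-in for syuzhet::get_nrc_sentiment(): counts words from a
# tiny made-up lexicon. It only illustrates the shape of the output.
toy_scorer <- function(sentences) {
  joy_words   <- c("happy", "good")
  anger_words <- c("rage", "murder")
  tokens <- strsplit(tolower(sentences), "[^a-z]+")
  data.frame(
    joy   = vapply(tokens, function(t) sum(t %in% joy_words), integer(1)),
    anger = vapply(tokens, function(t) sum(t %in% anger_words), integer(1))
  )
}

toy_data <- data.frame(
  school = c("capitalism", "communism"),
  sentence_spacy = c("A good and happy trade.", "Robbery, murder and rage."),
  stringsAsFactors = FALSE
)

# Score every sentence in one call and bind the columns once,
# instead of calling rbind() on a growing data frame inside a loop.
toy_list <- cbind(toy_data,
                  sentences = toy_data$sentence_spacy,
                  toy_scorer(toy_data$sentence_spacy),
                  sent.id = seq_len(nrow(toy_data)))
toy_list
```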





```{r, warning=FALSE, message=FALSE}
names(sentence.list)
```

## Emotionally charged sentences

Capitalism:
```{r}
emotions.types=c("anticipation", "joy", "surprise", "trust",
                 "anger", "disgust", "fear", "sadness", "negative", "positive")

speech.df <- sentence.list %>%
  filter(school == "capitalism", word.count < 20) %>%
  select(sentences, anger:positive)
# for each emotion column, the sentence with the highest score
as.character(speech.df$sentences[apply(speech.df[, -1], 2, which.max)])
```

Communism:
```{r}

speech.df <- sentence.list %>%
  filter(school == "communism", word.count < 20) %>%
  select(sentences, anger:positive)
# for each emotion column, the sentence with the highest score
as.character(speech.df$sentences[apply(speech.df[, -1], 2, which.max)])
```
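Both chunks above use `apply(..., 2, which.max)` to pick, for each emotion column, the sentence with the highest score. The toy example below uses made-up scores to show the mechanics, and it also attaches the emotion names to the result. Note that `which.max()` returns the first maximum, so when several sentences tie, the earliest one wins. Short sentences with integer NRC counts tie often, which may be one reason the same sentence can come out on top for several emotions.

```r
# Made-up scores: one row per sentence, one column per emotion
speech <- data.frame(
  sentences = c("s1", "s2", "s3"),
  anger     = c(0, 2, 2),
  joy       = c(1, 0, 3),
  stringsAsFactors = FALSE
)

# Row index of the highest score in each emotion column.
# which.max() returns the first maximum, so the tie in `anger` goes to "s2".
top_idx <- apply(speech[, -1], 2, which.max)

# Label each selected sentence with its emotion
top_sentences <- setNames(speech$sentences[top_idx], names(speech)[-1])
top_sentences
```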

## Clustering of emotions

Capitalism:
```{r}
heatmap.2(cor(sentence.list%>%filter(school=="capitalism")%>%select(anger:positive)), 
          scale = "none", 
          col = bluered(100), margins = c(6, 6), key = FALSE,
          trace = "none", density.info = "none")
```

Communism:
```{r}
heatmap.2(cor(sentence.list%>%filter(school=="communism")%>%select(anger:positive)), 
          scale = "none", 
          col = bluered(100), margins = c(6, 6), key = FALSE,
          trace = "none", density.info = "none")
```
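By default, `heatmap.2()` orders rows and columns by running `hclust()` on `dist()` of the matrix it receives, which here is the emotion correlation matrix. The sketch below reproduces that clustering step on simulated scores, so all data are made up. Emotions with similar correlation profiles end up in the same branch. It is not guaranteed to match the exact leaf order in the heatmaps, because `heatmap.2()` also reorders the dendrogram by row means.

```r
set.seed(1)
n <- 200
# Two latent "moods"; each emotion is one mood plus noise (simulated data)
pos <- rnorm(n)
neg <- rnorm(n)
scores <- data.frame(
  joy   = pos + rnorm(n, sd = 0.3),
  trust = pos + rnorm(n, sd = 0.3),
  anger = neg + rnorm(n, sd = 0.3),
  fear  = neg + rnorm(n, sd = 0.3)
)

cor_mat <- cor(scores)
# heatmap.2()'s default clustering: Euclidean dist() between rows, then hclust()
hc <- hclust(dist(cor_mat))
groups <- cutree(hc, k = 2)
groups
```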



```{r}
sentence.list.capitalism <- sentence.list %>% filter(school=="capitalism")
# share of sentences with a nonzero score per emotion (NRC scores are word counts)
emo.means1=colMeans(select(sentence.list.capitalism, anger:positive)>0.01)
col.use1=c("red2", "darkgoldenrod1", "chartreuse3","blueviolet", "darkgoldenrod1", "dodgerblue3", "darkgoldenrod1", "darkgoldenrod1", "black", "darkgoldenrod1")
par(mar=c(4, 6, 2, 1))  # widen the left margin for the horizontal bar labels
barplot(emo.means1[order(emo.means1)], las=2, col=col.use1[order(emo.means1)], horiz=TRUE,
        cex.names=0.7, main="Capitalism")
```
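`colMeans(... > 0.01)` averages a logical matrix. This means `emo.means1` is the share of sentences with a nonzero score for each emotion, not the average score, because the NRC scores from `get_nrc_sentiment()` are word counts. The toy example below uses made-up counts to contrast the two summaries.

```r
# Made-up NRC-style counts: one row per sentence, one column per emotion
scores <- data.frame(anger = c(0, 2, 0, 1),
                     joy   = c(1, 0, 0, 0),
                     trust = c(3, 1, 0, 2))

# What emo.means1 holds: share of sentences whose score exceeds 0.01
share_nonzero <- colMeans(scores > 0.01)

# An alternative summary: the average raw count per sentence
mean_score <- colMeans(scores)
rbind(share_nonzero, mean_score)
```

The two can rank emotions differently: `trust` is present in 75% of the toy sentences but averages 1.5 words per sentence.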

```{r}
sentiment.df1 <- data.frame(
  group=c("anger", "anticipation", "disgust","fear", "joy", "sadness", "surprise", "trust", "negative", "positive"),
  value=emo.means1
)

sentiment.df1 <- sentiment.df1 %>% 
  mutate(perc = value / sum(value)) %>% 
  arrange(perc) %>%
  mutate(labels = scales::percent(perc))


ggplot(sentiment.df1, aes(x="", y=perc, fill=group)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  geom_text(aes(label = labels), 
            position = position_stack(vjust = 0.5),
            show.legend = FALSE,
            color = "white", size=3) +
  scale_fill_brewer(palette="Spectral") +
  labs(
    title = "Percentage of the sentiment in Capitalism",
    x = "",
    y = ""
  )
```
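The pie chart rescales `emo.means1` so that the slices sum to 100%. One sentence can carry several emotions, so the underlying shares can add up to more than 1. A slice therefore shows an emotion's weight relative to the others, not the percentage of sentences that express it. Note also that `negative` and `positive` are polarity labels in the NRC lexicon rather than emotions, yet they share the pie with the eight emotions. Here is the same arithmetic on made-up shares:

```r
# Made-up shares of sentences showing each emotion (they sum to 1.5, not 1)
shares <- c(anger = 0.5, joy = 0.25, trust = 0.75)

perc <- shares / sum(shares)             # rescaled so the slices sum to 1
labels <- sprintf("%.1f%%", 100 * perc)  # same idea as scales::percent()
labels
```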





```{r}
sentence.list.communism <- sentence.list %>% filter(school=="communism")
# share of sentences with a nonzero score per emotion (NRC scores are word counts)
emo.means2=colMeans(select(sentence.list.communism, anger:positive)>0.01)
col.use2=c("red2", "darkgoldenrod1", "chartreuse3","blueviolet", "darkgoldenrod1", "dodgerblue3", "darkgoldenrod1", "darkgoldenrod1", "black", "darkgoldenrod1")
par(mar=c(4, 6, 2, 1))  # widen the left margin for the horizontal bar labels
barplot(emo.means2[order(emo.means2)], las=2, col=col.use2[order(emo.means2)], horiz=TRUE,
        cex.names=0.7, main="Communism")
```


```{r}
sentiment.df2 <- data.frame(
  group=c("anger", "anticipation", "disgust","fear", "joy", "sadness", "surprise", "trust", "negative", "positive"),
  value=emo.means2
)

sentiment.df2 <- sentiment.df2 %>% 
  mutate(perc = value / sum(value)) %>% 
  arrange(perc) %>%
  mutate(labels = scales::percent(perc))


ggplot(sentiment.df2, aes(x="", y=perc, fill=group)) +
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  geom_text(aes(label = labels), 
            position = position_stack(vjust = 0.5),
            show.legend = FALSE,
            color = "white", size=3) +
  scale_fill_brewer(palette="Spectral") +
  labs(
    title = "Percentage of the sentiment in Communism",
    x = "",
    y = ""
  )
```




```{r}
sentiment.df.all <- data.frame(
  group=c("anger", "anticipation", "disgust","fear", "joy", "sadness", "surprise", "trust", "negative", "positive"),
  capitalism=emo.means1,
  communism=emo.means2
)

sentiment.df.all.long <- sentiment.df.all %>% pivot_longer(c("capitalism", "communism"))

g3 <- sentiment.df.all.long %>% ggplot(aes(x = group, y = value, fill = name)) +
  geom_col(position = "dodge") +
  labs(
    x = "",
    y = "share of sentences with a nonzero score",
    title = "Sentiment comparison: Capitalism vs. Communism"
  )

g3
```



```{r}
sentence.list.Marx <- sentence.list %>% filter(author=="Marx")
emo.means.Marx=colMeans(select(sentence.list.Marx, anger:positive)>0.01)

sentence.list.Lenin <- sentence.list %>% filter(author=="Lenin")
emo.means.Lenin=colMeans(select(sentence.list.Lenin, anger:positive)>0.01)

sentence.list.Smith <- sentence.list %>% filter(author=="Smith")
emo.means.Smith=colMeans(select(sentence.list.Smith, anger:positive)>0.01)

sentence.list.Ricardo <- sentence.list %>% filter(author=="Ricardo")
emo.means.Ricardo=colMeans(select(sentence.list.Ricardo, anger:positive)>0.01)

sentence.list.Keynes <- sentence.list %>% filter(author=="Keynes")
emo.means.Keynes=colMeans(select(sentence.list.Keynes, anger:positive)>0.01)

sentiment.df.byauthor <- data.frame(
  group=c("anger", "anticipation", "disgust","fear", "joy", "sadness", "surprise", "trust", "negative", "positive"),
  Marx=emo.means.Marx,
  Lenin=emo.means.Lenin,
  Smith=emo.means.Smith,
  Ricardo=emo.means.Ricardo,
  Keynes=emo.means.Keynes
)

sentiment.df.byauthor.long <- sentiment.df.byauthor %>% pivot_longer(c("Marx", "Lenin", "Smith", "Ricardo", "Keynes" ))

g4 <- sentiment.df.byauthor.long %>% ggplot(aes(x = group, y = value, fill = name)) +
  geom_col(position = "dodge") +
  labs(
    x = "",
    y = "share of sentences with a nonzero score",
    title = "Sentiment comparison: all the authors",
    subtitle = "Communism: Marx, Lenin \nCapitalism: Keynes, Ricardo, Smith"
  )

g4

```
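The per-author chunk above repeats the same two lines for each of the five authors. Below is a minimal base-R sketch of the same computation on a made-up `scores` table. It uses `split()` to separate the emotion columns by author, applies the `colMeans(... > 0.01)` summary to each piece, and reshapes the result into the long `group`/`name`/`value` layout that `g4` plots. The column names and values here are illustrative.

```r
# Made-up scores: one row per sentence
scores <- data.frame(
  author = c("Marx", "Marx", "Smith", "Smith", "Smith"),
  anger  = c(1, 0, 0, 0, 2),
  joy    = c(0, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)
emo_cols <- c("anger", "joy")

# One column per author, one row per emotion: share of sentences with a nonzero score
by_author <- sapply(split(scores[emo_cols], scores$author),
                    function(df) colMeans(df > 0.01))

# Long format like sentiment.df.byauthor.long (group = emotion, name = author)
by_author_long <- data.frame(
  group = rep(rownames(by_author), times = ncol(by_author)),
  name  = rep(colnames(by_author), each = nrow(by_author)),
  value = as.vector(by_author),  # column-major order matches the rep() calls
  stringsAsFactors = FALSE
)
by_author_long
```

Adding an author then only requires that the author appear in the data, not a new pair of lines.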



